ABSTRACT

Human mitochondrial DNAs (mtDNAs) from 153 independent samples encompassing seven Asian populations were surveyed for sequence variation using the polymerase chain reaction (PCR), restriction endonuclease analysis and oligonucleotide hybridization. All Asian populations were found to share two ancient AluI/DdeI polymorphisms at nps 10394 and 10397 and to be genetically similar, indicating that they share a common ancestry. The greatest mtDNA diversity and the highest frequency of mtDNAs with HpaI/HincII morph 1 were observed in the Vietnamese, suggesting a Southern Mongoloid origin of Asians. Remnants of the founding populations of Papua New Guinea (PNG) were found in Malaysia, and a marked frequency cline for the COII/tRNALys intergenic deletion was observed along coastal Asia. Phylogenetic analysis indicates that both insertion and deletion mutations in the COII/tRNALys region have occurred more than once.


PREHISTORIC migrations in Southeast Asia have been the subject of much speculation (ISKANDAR 1976; BELLWOOD 1979, 1985; TURNER 1987; ZHAO and LEE 1989). Using dental morphological traits, TURNER (1983, 1987) hypothesized that two migrations originated from central China about 20,000-30,000 YBP (years before present). One group, the Sinodonts, expanded northward into China, Siberia and across the Bering land bridge to the New World. The second group, the Sundadonts, moved southward into Southeast Asia and Indonesia, and later through Melanesia, Micronesia, and Polynesia.

Using linguistic comparisons, BELLWOOD (1985) proposed two major prehistoric migrations into Southeast Asia. The first was an ancient “Australoid” migration from the Indo-Malaysian Archipelago which settled Australia and New Guinea about 40,000 YBP. The second was the more recent “Southern Mongoloid” or “Austronesian” migration that originated from the Fujian or Zhejiang provinces of contemporary China and settled throughout much of island and mainland Southeast Asia about 4,000-6,000 YBP.

It is believed that the remnant Australoid populations in Southeast Asia were essentially replaced or assimilated by the Southern Mongoloid migration (BELLWOOD 1985), whereas the major Australoid populations in Australia and New Guinea were relatively unaffected by the second migration (TURNER 1987). Assuming that these interpretations are correct, we anticipate that Southeast Asia may be a clinal zone between the Australoid (south and east) and Mongoloid (north and west) genotypes.

We have investigated the regional sequence variation of mtDNAs from Southeast Asian populations. By virtue of its matrilineal transmission (GILES et al. 1980; CASE and WALLACE 1981) and high mutation rate (BROWN, GEORGE and WILSON 1979; MIYATA et al. 1982; WALLACE et al. 1987), the mtDNA rapidly accumulates sequence changes along radiating female lineages, thus providing a detailed record of ancient migration patterns.

Previous analyses of mtDNAs from the Southeast Asian Aeta (Philippine Negritos) and other Asians [Japanese, Ainu (northern Japanese) and Koreans] have confirmed the Mongoloid affinity of these populations (HARIHARA et al. 1988). A detailed analysis of coastal and highland PNG mtDNAs has also confirmed the Asian association of these populations (STONEKING et al. 1990). Moreover, the coastal and highland PNG populations have become genetically differentiated (STONEKING et al. 1990): coastal populations have the 9-bp COII/tRNALys intergenic deletion (CANN and WILSON 1983; WRISCHNIK et al. 1987) in about 40% of their mtDNAs, whereas the highland populations lack this marker (HERTZBERG et al. 1989; STONEKING et al. 1990). This marker is also associated with Pacific coastal and island populations, appearing at high frequencies in Melanesia and Polynesia and reaching fixation (100%) on some islands (HERTZBERG et al. 1989).

In an effort to integrate these Asian mtDNA studies into a coherent view of Southern Mongoloid migrations, we have conducted a detailed analysis of the mtDNAs from seven East Asian populations. The data provide evidence that: (1) the Vietnamese are the most diverse and, hence, the oldest population; (2) Malaysians retain remnants of haplotypes found in PNG; (3) coastal Asians have a striking frequency cline for the 9-bp deletion; and (4) both insertion and deletion mutations in the COII/tRNALys intergenic region have occurred more than once.


MATERIALS AND METHODS 


Populations: Asian blood samples were collected from 153 independent maternal pedigrees (unrelated through at least one generation). These included 14 Malaysian Chinese descended from the Fujian/Guangdong region of south China; 14 Malays and 32 Malay Aborigines or “Orang Asli” [7 Temiar, 5 Semai, 1 Jakun, 2 Jeni, and 17 others of unidentified tribal origin] from the Malay peninsula; 30 Aborigines [24 Kadazan (Dusun), 2 Berungei, 3 Rungus, 1 Murut] and 2 Bisaya (Northern Borneo) from Sabah state (Borneo), Malaysia; 20 Han Chinese from Taiwan originating from central (Hunan) China; 28 Vietnamese; and 13 Koreans from South Korea (Seoul, Taejon, and Tamyang).

Methods: DNA was extracted from platelets, lymphoblastoid cell pellets and buffy coats (WALLACE, GARRISON and KNOWLER 1985). The mtDNA of each sample was PCR amplified (SAIKI et al. 1985) with AmpliTaq polymerase (Perkin Elmer-Cetus) in 9 overlapping fragments that together encompass the entire mtDNA molecule (SCHURR et al. 1990) (APPENDIX A). All PCR fragments were digested with 18 restriction endonucleases [AluI, AvaII, BamHI, DdeI, HaeII, HaeIII, HhaI, HincII, HinfI, HpaI, HpaII, MboI, PstI, PvuII, RsaI, TaqI, XbaI, XhoI], electrophoresed on 1.0-4.0% NuSieve + 1.0% SeaKem agarose gels (FMC Bio-Products) containing 1 µg/ml ethidium bromide, and visualized by UV fluorescence to determine their restriction patterns.
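As a quick consistency check on the statement that the nine fragments cover the whole molecule, the sketch below takes the primer coordinates from Table 4 (APPENDIX A), treats each fragment as running from the forward primer's 5' end to the reverse primer's 5' end, and measures the overlap between neighboring fragments on the 16,569-bp circle. The helper names are illustrative, and the script assumes the fragments are listed in genome order, as they are in Table 4.

```python
MTDNA_LENGTH = 16569  # bp, published human mtDNA sequence (ANDERSON et al. 1981)

# (forward primer 5' end, reverse primer 5' end) for each fragment, from Table 4
PRIMER_PAIRS = [
    (1562, 3717), (3007, 5917), (5317, 7608), (7392, 8921),
    (8282, 10107), (9911, 11873), (11673, 13950), (13914, 16547),
    (16453, 1696),  # the last fragment wraps across position 1 of the circle
]


def fragment_length(start, end, genome_length=MTDNA_LENGTH):
    """Fragment length in bp, inclusive of both ends, allowing wrap-around."""
    if end >= start:
        return end - start + 1
    return (genome_length - start + 1) + end


def neighbor_overlaps(pairs):
    """Overlap in bp between each fragment and the next one around the circle.

    Positive values mean the two fragments share sequence; zero or negative
    would indicate a gap. Assumes the fragments are in genome order.
    """
    overlaps = []
    for i, (_, end) in enumerate(pairs):
        next_start = pairs[(i + 1) % len(pairs)][0]
        overlaps.append(end - next_start + 1)
    return overlaps


if __name__ == "__main__":
    lengths = [fragment_length(s, e) for s, e in PRIMER_PAIRS]
    overlaps = neighbor_overlaps(PRIMER_PAIRS)
    print("fragment lengths:", lengths)
    print("overlaps:", overlaps)
    # If every overlap is positive the circle is fully covered, and the total
    # fragment length minus the overlaps equals the genome length.
    print("covers circle:", min(overlaps) > 0)
    print("net length:", sum(lengths) - sum(overlaps))
```

With the Table 4 coordinates the smallest overlap is 37 bp (between the fragment ending at np 13950 and the one starting at np 13914), and the net length comes to exactly 16,569 bp.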

In addition, each sample was screened for the COII/tRNALys length mutations (CANN and WILSON 1983; WRISCHNIK et al. 1987) by differential oligonucleotide hybridization (SCHURR et al. 1990). The results were confirmed by restriction analysis.

Sequence divergence: Inter- and intrapopulational mtDNA haplotype divergences were estimated by maximum likelihood based upon nucleotide counting (NEI and TAJIMA 1983), using the computer program DREST (generously provided by L. Jin). This procedure uses the ratio of shared sites to the total number of sites between two haplotypes, together with the mean length of the restriction enzyme recognition sequences, to calculate an initial estimate of p (the probability that the two mtDNAs have different nucleotides at a given nucleotide position). Starting from this initial estimate, p is solved iteratively using Equation 28, and sequence divergence is then estimated by Equation 21 (NEI and TAJIMA 1983).
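The iterative maximum likelihood procedure of NEI and TAJIMA (1983) implemented in DREST is not reproduced here. As a rough illustration of the inputs it starts from, the sketch below computes the proportion of shared sites, S = 2n_xy/(n_x + n_y), for two haplotypes represented as sets of site labels, and converts it with a simpler non-iterative approximation, d = -(2/r) ln S, where r is the mean recognition-sequence length. This is a swapped-in approximation, not the Equation 21/28 procedure, and the site sets in the example are made up.

```python
import math


def shared_site_proportion(sites_x, sites_y):
    """S = 2 * n_xy / (n_x + n_y) for two sets of restriction-site labels."""
    n_xy = len(sites_x & sites_y)
    return 2 * n_xy / (len(sites_x) + len(sites_y))


def approx_divergence(sites_x, sites_y, r):
    """Approximate nucleotide divergence d = -(2 / r) * ln(S).

    r is the mean length (bp) of the enzyme recognition sequences.
    Non-iterative approximation only; not the NEI and TAJIMA (1983)
    maximum likelihood estimate used in the paper.
    """
    s = shared_site_proportion(sites_x, sites_y)
    if s == 0:
        raise ValueError("no shared sites; divergence is undefined")
    return -(2.0 / r) * math.log(s)


# Hypothetical example: two haplotypes scored at 390 sites each,
# differing by two site gains and two site losses.
hap_x = set(range(390))
hap_y = set(range(2, 392))
print(f"{approx_divergence(hap_x, hap_y, r=4.0):.5f}")  # prints 0.00257
```

Identical haplotypes give S = 1 and hence d = 0; each unshared site lowers S and raises d in proportion to 2/r.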

Phylogenetic analysis: Phylogenetic trees were inferred from the restriction site data under the principle of maximum parsimony using PAUP (Version 3.0m, SWOFFORD 1990). A variety of rooting techniques, using hypothetical ancestors inferred from hypothesized African haplotypes [e.g., Ancestor “a” from CANN, STONEKING and WILSON (1987)] and from Caucasian haplotypes (unpublished data), were used to determine the genealogical relationships of the Asian mtDNA haplotypes. All methods produced similar results.


RESULTS 


Restriction analysis: A total of 191 polymorphic sites and 106 haplotypes were observed in these mtDNAs (APPENDIX B), with an average of 390 independent sites per genome, or approximately 10% of the mtDNA sequence, being screened. Only 8 haplotypes (25, 28, 33, 51, 54, 55, 62 and 83) were found to be shared between 2 or more populations. Twenty “haplotype groups” were identified, distinguished by shared polymorphisms. Table 1 lists each sample and its corresponding haplotype and haplotype group. Figure 1 shows the distribution and frequencies of these haplotype groups in Southeast Asia. Of all the haplotype groups, “D” was the most prevalent, with “A,” “E,” and “F” the next most frequent. These four groups are distributed in nearly every population sampled.
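Two of the summary figures above can be reproduced with a few lines of code. The sketch below (a) converts the average of 390 scored sites into an approximate fraction of the 16,569-bp genome, which requires an assumed mean recognition-sequence length (the paper does not state one; values of 4 to 4.5 bp, plausible for an enzyme panel dominated by 4-base cutters, give roughly 9-11%), and (b) flags haplotypes that occur in two or more populations, taking the population code from the first two letters of the Table 1 sample IDs. The Table 1 excerpt is only a subset of the data, so it recovers just two of the eight shared haplotypes.

```python
from collections import defaultdict

MTDNA_LENGTH = 16569  # bp


def fraction_screened(n_sites, mean_recognition_length, genome_length=MTDNA_LENGTH):
    """Approximate fraction of the genome covered by n_sites recognition sequences."""
    return n_sites * mean_recognition_length / genome_length


def shared_haplotypes(samples):
    """Haplotype numbers seen in two or more populations.

    samples maps sample ID -> haplotype number; the population code is
    taken to be the first two letters of the ID (MC, VN, MA, MM, TW, KN, SA).
    """
    populations_by_haplotype = defaultdict(set)
    for sample_id, haplotype in samples.items():
        populations_by_haplotype[haplotype].add(sample_id[:2])
    return sorted(h for h, pops in populations_by_haplotype.items() if len(pops) >= 2)


# Excerpt of Table 1 (a subset of the 153 samples)
TABLE1_EXCERPT = {
    "MC10": 25, "MC13": 25, "TW13": 25, "KN01": 25,
    "MM01": 83, "SA07": 83,
    "SA01": 106, "SA09": 106, "SA17": 106, "SA23": 106,
    "VN09": 37, "VN10": 37, "VN15": 37,
}

for r in (4.0, 4.5):  # assumed mean recognition lengths, not values from the paper
    print(f"r = {r}: {fraction_screened(390, r):.1%} of the genome")
print("shared:", shared_haplotypes(TABLE1_EXCERPT))  # [25, 83]
```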

Differences at HincII/HpaI sites also revealed several informative groupings of haplotypes. MtDNAs with the HincII/HpaI site loss at np 12406 [HpaI/HincII morph-1 (DENARO et al. 1981; BLANC et al. 1983); group A] occurred at the highest frequencies in the Vietnamese and Malay Aborigines (32.1% and 28.1%, respectively), and were found in almost all of the populations surveyed. A large proportion of these mtDNAs (80.0%) also had a combined HaeII np 9052/HhaI np 9053 site loss. The HincII np 207 site gain previously observed in PNG populations (STONEKING, BHATIA and WILSON 1986; STONEKING et al. 1990) was also found in a Malay peninsula population (MM09, haplotype 90). Another HincII site loss, at np 7853 [morph-5 (BLANC et al. 1983)], was seen in Vietnamese, Taiwanese Han and Sabah Aborigine mtDNAs and defined haplotype group B. In addition, several Sabah Aborigines had a new HincII site loss at np 7937. Lastly, the HincII np 12026 site gain in the Vietnamese [seen previously in one Australian aborigine (CANN, BROWN and WILSON 1984)] and the np 1004 site loss observed in the Vietnamese and Taiwanese Han [previously seen in one Japanese (HORAI, GOJOBORI and MATSUNAGA 1984)] define the separate haplotype groups P and Q, respectively.

The combined AluI np 10397 and DdeI np 10394 sites essentially split all haplotypes within these Asian populations into two major clusters (Figure 2). Several haplotypes had only the DdeI site, which creates a semisite for AluI at np 10397. Thus the DdeI site is not only necessary for, but precedes, the creation of the AluI site at np 10397. Both sites may be lost concurrently via a single base substitution in the overlapping recognition sequences.
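The dependency between the two sites follows from their overlapping recognition sequences: DdeI recognizes CTNAG and AluI recognizes AGCT, so a DdeI site starting at np 10394 and an AluI site starting at np 10397 share the dinucleotide at nps 10397-10398. The sketch below scans a made-up sequence (not real mtDNA) with the same 3-bp offset to show that a change outside the overlap removes only the AluI site, leaving the DdeI-only state, while one substitution in the shared G removes both sites.

```python
import re

# Standard recognition sequences: DdeI C^TNAG (N = any base), AluI AG^CT.
# Lookahead patterns report every start position, including overlapping ones.
DDEI = re.compile(r"(?=CT[ACGT]AG)")
ALUI = re.compile(r"(?=AGCT)")


def site_starts(pattern, seq):
    """0-based start positions of all matches of pattern in seq."""
    return [m.start() for m in pattern.finditer(seq)]


def substitute(seq, pos, base):
    """Return seq with a single base substitution at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


# Toy sequence (hypothetical, not mtDNA): a DdeI site at 2 and an AluI site
# at 5 share "AG", the same 3-bp offset as the sites at nps 10394/10397.
both = "TTCTAAGCTTT"
ddei_only = substitute(both, 8, "C")  # AGCT -> AGCC: AluI lost, DdeI kept
neither = substitute(both, 6, "A")    # shared G -> A: both sites lost

for name, seq in [("both", both), ("ddei_only", ddei_only), ("neither", neither)]:
    print(name, site_starts(DDEI, seq), site_starts(ALUI, seq))
```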
TABLE 1

Sample haplotypes and haplotype groupings

Sample  Haplotype  Group    Sample  Haplotype  Group    Sample  Haplotype  Group    Sample  Haplotype  Group
MC01    17         N        VN28    53         A        MM04    86         O        KN08    100        C
MC02    18         N        VN29    54         D        MM05    87         E        KN09    101        D
MC03    19         N        VN30    55         C        MM06    88         E        KN10    102        G
MC04    20         N        MA01    69         A        MM07    62         F        KN11    103        G
MC05    21         F        MA02    70         F        MM08    89         G        KN12    104        K
MC06    22         E        MA03    71         A        MM09    90         S        KN13    105        M
MC07    23         T        MA04    62         F        MM10    91         A        SA01    106        G
MC08    24         O        MA05    62         F        MM11    49         D        SA02    107        B
MC10    25         L        MA06    72         D        MM12    92         F        SA03    108        E
MC11    26         F        MA07    73         I        MM13    93         E        SA04    109        G
MC12    27         N        MA08    74         D        MM14    94         G        SA05    110        D
MC13    25         L        MA09    75         I        TW01    56         H        SA06    111        D
MC14    28         H        MA10    76         J        TW02    28         H        SA07    83         G
MC15    29         O        MA11    77         F        TW03    54         D        SA08    112        N
VN01    30         A        MA12    78         I        TW04    57         D        SA09    106        G
VN02    31         E        MA13    33         A        TW05    57         D        SA10    113        E
VN04    32         Q        MA14    33         A        TW06    58         D        SA11    114        E
VN05    33         A        MA15    33         A        TW07    59         C        SA12    114        E
VN06    34         M        MA16    33         A        TW08    55         C        SA13    115        D
VN07    35         T        MA17    75         I        TW09    60         C        SA14    112        N
VN08    36         C        MA18    72         D        TW10    61         A        SA15    109        G
VN09    37         B        MA19    79         D        TW11    62         F        SA16    111        D
VN10    37         B        MA20    33         A        TW12    62         F        SA17    106        G
VN11    38         E        MA21    76         J        TW13    25         L        SA18    116        D
VN12    39         A        MA22    72         D        TW14    63         B        SA19    117        E
VN13    40         F        MA23    74         D        TW15    64         F        SA20    118        E
VN14    41         E        MA24    62         F        TW16    65         R        SA21    111        D
VN15    37         B        MA25    33         A        TW17    66         T        SA22    109        G
VN16    42         B        MA27    80         D        TW18    51         A        SA23    106        G
VN17    43         B        MA28    62         F        TW19    67         B        BS24    115        D
VN18    44         P        MA29    81         I        TW20    68         Q        SA25    119        G
VN19    45         A        MA30    82         I        KN01    25         L        SA26    54         D
VN21    46         A        MA31    82         I        KN02    95         F        SA27    120        A
VN22    47         A        MA32    33         A        KN03    96         K        SA28    121        D
VN23    48         A        MA33    72         D        KN04    97         L        SA29    115        D
VN24    54         D        MM01    83         G        KN05    98         A        SA30    114        E
VN25    50         T        MM02    84         D        KN06    99         A        SA31    55         C
VN26    51         A        MM03    85         A        KN07    96         K        SA32    122        E
VN27    52         B

Sample abbreviations: MC, Malaysian Chinese; VN, Vietnamese; MA, Malay Aborigines; MM, Malays; TW, Taiwanese Han; KN, Koreans; SA, Sabah Aborigines. Haplotype groupings were classified according to polymorphic sites that were shared within each group. The site gains and losses for each haplotype group relative to the published sequence (ANDERSON et al. 1981) are in bold face and non-bold face, respectively; slashes between enzyme letters or sites indicate non-independent events. Restriction sites enclosed in brackets frequently accompany the definitive sites of that haplotype group. The letter designations of the restriction enzymes are as follows: a, AluI; b, AvaII; c, DdeI; e, HaeIII; f, HhaI; g, HinfI; h, HpaI; i, HpaII; j, MboI; k, RsaI; l, TaqI; m, BamHI; n, HaeII; o, HincII. Haplotype groups are indicated by capital letters and consist of sets of polymorphic restriction sites: [A] 12406h/12406o, 16517e, [9052n/9053f]; [B] 7853o, 10394c, 10397a; [C] 3534c/3537a, 10394c, 15234g/15235j, 16517e; [D] 16517e; [E] 10394c, 10397a, 16517e; [F] 10394c, 10397a; [G] 16389g/16390b, 10394c, 10397a, 7598f, 16517e; [H] 663e, [16517e]; [I] 10143a, 9326n/9329f, 10394c, 10397a, [951j, 1063e]; [J] 4711i, 11403g, 131180j, 1715c, 10394c; [K] 4830n/4831f, 10394c, 10397a; [L] 5176a, 10394c, 10397a; [M] 10394c, [16517e]; [N] 16517e; [O] 13366b/13367j/13368m, [16517e]; [P] 12026h/12026o; [Q] 1002q/1004o; [R] 13259o/13261a, 10394c, 10397a, 16517e; [S] 207h/207o, 15606a; [T] 16389g/16390b, 16517e. The COII/tRNALys 9-bp deletion occurs within haplotype groups A-D and F; the 4-bp addition occurs within haplotype groups A and M.


Asian COII/tRNALys intergenic length mutation haplotypes: The 9-bp COII/tRNALys deletion (CANN and WILSON 1983; WRISCHNIK et al. 1987) was observed in 25 individuals from all seven populations, comprising 16.3% of the samples within this study. Table 2 shows the overall distribution of the deletion in the populations analyzed. Haplotype groups C and D include 14 of the 17 deletion haplotypes (Table 1; APPENDIX B). Although these haplotype groups form 2 distinct clades distinguished by the DdeI np 10394 site, they all have the same HincII sites [morph-2 (BLANC et al. 1983)], and can be derived from haplotype 54. In contrast, the Vietnamese deletion haplotype 43 shares several sites with group B haplotypes and must have been derived from an unrelated mtDNA (i.e., haplotype 37), indicating that the COII/tRNALys deletion has occurred more than once.

                   Haplotype group
Population           A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T    n
Koreans              2  0  1  1  0  1  1  1  0  0  3  2  1  0  0  0  0  0  0  0   13
Malays               2  0  0  2  3  2  3  0  0  0  0  0  0  0  1  0  0  0  1  0   14
Malay Aborigines     9  0  0  8  0  6  0  0  7  2  0  0  0  0  0  0  0  0  0  0   32
Malaysian Chinese    0  0  0  0  1  2  0  1  0  0  0  2  0  5  2  0  0  0  0  1   14
Sabah Aborigines     1  1  1 10  8  0  9  0  0  0  0  0  0  2  0  0  0  0  0  0   32
Taiwanese Han        2  2  3  4  0  3  0  2  0  0  0  1  0  0  0  0  1  1  0  1   20
Vietnamese           9  6  2  2  3  1  0  0  0  0  0  0  1  0  0  1  1  0  0  2   28

FIGURE 1.—Map of Southeast Asia showing mtDNA sample localities (map not reproduced). Haplotype groups are described in Table 1. The number of mtDNAs within each haplotype group is given under the corresponding letter, and the total number of individuals in each population under “n.”

In addition, two individuals (VN06 and SA27) had an insertion of approximately 4 bp in the COII/tRNALys region, yet their haplotypes (34, Vietnamese, and 120, Sabah Aborigine) are distinct and differ by at least 8 mutational events. Consequently, insertion mutations must also have occurred at least twice in Southeast Asian populations.

Genetic divergence: Table 3 presents the genetic divergence estimates for intra- and interpopulational comparisons (NEI and TAJIMA 1983). The highest intrapopulational divergence was observed within the Vietnamese, at 0.00236 (0.236%). The lowest was within the Malay Aborigines and Taiwanese Han, at 0.148% and 0.145%, respectively.

Phylogenetic analysis: A tree generated using a hypothesized ancestor (haplotype HYPANC) is presented in Figure 2. The two branches of this tree are defined by the DdeI and AluI sites at nps 10394 and 10397, respectively. The majority of the deletion haplotypes cluster together (groups D* and C), except for haplotypes 21, 43 and 61. Most of the distinct branches within the tree encompass multiple populations, indicating that some haplotype groups (A-E and G) may represent common ancient Asian lineages. Overall there are few population-specific groupings of haplotypes within the network. One exception is haplotype group I, in which five Malay aboriginal haplotypes (73, 75, 78, 81, 82) are associated.

Combining the Southeast Asian mtDNA data with those of PNG indicates that specific haplotype groupings are more characteristic of isolated populations. A tree of both data sets (including the PNG data of STONEKING et al. 1990) is presented in Figure 3. Two additional haplotype groups now stand out: group S, defined by the HincII/HpaI np 207 site gain, and group U, a subgroup of F. Like the Orang Asli group I, groups S and U are isolated, occurring predominantly in highland PNG. Other PNG haplotypes are dispersed within the other Southeast Asian haplotype groups. These include type P150, which falls within group A, and deletion types P119-P130, which fall within group D*.


DISCUSSION 


Similarity of Mongoloid types: Analysis of Southeast Asian mtDNA variation indicates that all extant populations were derived from a common ancestral population which encompassed most of the variation. The mean intrapopulational divergence is 0.182%, while the mean interpopulational divergence corrected for intrapopulational divergence (NEI and TAJIMA 1983) is about one-sixth of this value, or 0.030%, with a range of 0.019% to 0.053% (Table 3). Thus, most of the mtDNA variation appears to be shared between the Southeast Asian populations and to have predated the present geographic subdivision. Of the current populations, the Vietnamese have the greatest intrapopulational genetic divergence (0.236%), suggesting that they are the oldest. Since Vietnam was colonized by a southeast China migration, this would imply a southern Chinese origin of Mongoloid people about 59,000 to 118,000 YBP (assuming that mtDNA divergence is 2-4% per million years; CANN, BROWN and WILSON 1984; CANN, STONEKING and WILSON 1987; NECKELMANN et al. 1987, 1989; WALLACE et al. 1987).
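The corrected interpopulational values in Table 3 are consistent with subtracting the mean of the two intrapopulational divergences from the raw between-population divergence, d_A = d_XY - (d_X + d_Y)/2, and the date range follows from dividing the Vietnamese intrapopulational divergence by the assumed rate of 2-4% per million years. The sketch below reproduces both calculations from the Table 3 values; the helper names are illustrative.

```python
# Diagonal of Table 3: intrapopulational divergence (%) for each population
INTRA = {"MC": 0.196, "MM": 0.182, "MA": 0.148, "SA": 0.180,
         "TW": 0.145, "VN": 0.236, "KN": 0.185}


def net_divergence(d_xy, d_x, d_y):
    """Between-population divergence corrected for within-population
    divergence: d_A = d_XY - (d_X + d_Y) / 2 (all values in percent)."""
    return d_xy - (d_x + d_y) / 2


def divergence_time_years(divergence_percent, rate_percent_per_myr):
    """Years implied by a divergence under a constant rate (% per 10^6 years)."""
    return divergence_percent / rate_percent_per_myr * 1e6


mean_intra = sum(INTRA.values()) / len(INTRA)
print(f"mean intrapopulational divergence: {mean_intra:.3f}%")  # 0.182%

# MC vs SA: raw 0.241% above the diagonal, 0.053% below it in Table 3
print(f"MC-SA corrected: {net_divergence(0.241, INTRA['MC'], INTRA['SA']):.3f}%")

# Vietnamese diversity at 4% and 2% per million years
for rate in (4.0, 2.0):
    print(f"{rate}%/Myr: {divergence_time_years(INTRA['VN'], rate):,.0f} YBP")
```

Applied to every pair in Table 3, the same subtraction reproduces the below-diagonal entries to within rounding, and the two rates give 59,000 and 118,000 YBP.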

Haplotype group A (Table 1), which was present in six of the populations, further substantiates our previous proposal that the HincII/HpaI morph 1 polymorphism is associated with some of the earliest Asian mtDNAs (BLANC et al. 1983). This haplotype group is most frequent in the Vietnamese (32.1%) and the Malay Aborigines (28.1%). In light of their language affiliation [Austro-Asiatic family (BELLWOOD 1979)], these populations seem to be derived from a common stock. MtDNAs from haplotype group A were also found in the Taiwanese Han (10.0%), Malays (14.3%), Koreans (15.4%), and Sabah Aborigines (3.1%), substantiating the early appearance of this haplotype group.
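The group A percentages quoted above are simple proportions of the Figure 1 counts. A minimal sketch, with the counts copied from Figure 1:

```python
# (group A mtDNAs, population size) from Figure 1
GROUP_A_COUNTS = {
    "Vietnamese": (9, 28),
    "Malay Aborigines": (9, 32),
    "Koreans": (2, 13),
    "Malays": (2, 14),
    "Taiwanese Han": (2, 20),
    "Sabah Aborigines": (1, 32),
    "Malaysian Chinese": (0, 14),
}


def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


frequencies = {pop: percent(k, n) for pop, (k, n) in GROUP_A_COUNTS.items()}
populations_with_a = [pop for pop, (k, _) in GROUP_A_COUNTS.items() if k > 0]

for pop, freq in sorted(frequencies.items(), key=lambda item: -item[1]):
    print(f"{pop:<18} {freq:5.1f}%")
print("populations with group A:", len(populations_with_a))  # 6
```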


The aboriginal populations of the Malay peninsula (Senoi and Proto-Malays) and Borneo (Sabah) show a degree of genetic substructure. The majority of Malay Aborigine samples were taken from the Senoi, a group of tribes living in the mountainous jungles of peninsular Malaysia. This group is believed to have arrived with a “second wave” of migration occurring about 4,000-8,000 YBP (BELLWOOD 1985). It has been postulated that the islands of Borneo and Indonesia received the spillover from this migration in Southeast Asia and the Malay peninsula (TAN et al. 1979). The Semai tribe of the Senoi had the unique group I haplotypes (73, 75, 78, 81, and 82), defined by an AluI np 10143 site gain and a HaeII/HhaI nps 9326/9329 site gain. The Jeni (MA10, MA21) had haplotype 76 (group J), with several previously unreported polymorphisms. The populations of the Malay peninsula also showed close affinities to the Sabah Aborigines, sharing haplotype groups A, D, E and G.

TABLE 2

COII/tRNALys 9-bp deletion frequencies

Population             n       N       %   Reference
Malaysian Chinese      1      14     7.1
Malays                 2      14    14.3
Malay Aborigines       1      32     3.1
Sabah Aborigines       6      32   18.75
Taiwanese Han          8      20    40.0
Vietnamese             5      28    17.9
Koreans                2      13    15.4
Coastal PNG           23      55    41.8   STONEKING et al. 1990
Coastal PNG            4      28    14.2   HERTZBERG et al. 1989
PNG Highlanders        0      64     0.0   STONEKING et al. 1990
PNG Highlanders        0      30     0.0   HERTZBERG et al. 1989
Aust. Aborigines       1      31     3.2   HERTZBERG et al. 1989
Aust. Aborigines       0      20     0.0   CANN, STONEKING and WILSON 1987
Japanese              19     116    16.4   HORAI and MATSUNAGA 1986
East Asians*           6      34    17.6   CANN, STONEKING and WILSON 1987
Polynesians          139     150    92.7   HERTZBERG et al. 1989
Fijians               23      28    82.1   HERTZBERG et al. 1989
Amerindians
  Pima                14      31    45.2   SCHURR et al. 1990
  Maya                 8      37    21.6   SCHURR et al. 1990
  Ticuna               0      31     0.0   SCHURR et al. 1990

Malaysian Chinese, Malays, Malay Aborigines, Sabah Aborigines, Taiwanese Han, Vietnamese, and Korean populations represent the groups from this report. n = no. of deleted mtDNAs; N = total sample size; % = percentage of deleted mtDNAs within a population. * East Asians = 1 Japanese, 1 Taiwanese, 1 Vietnamese, 2 Filipino and 1 Tongan.

TABLE 3

Percent sequence divergence

      MC     MM     MA     SA     TW     VN     KN
MC    0.196  0.229  0.207  0.241  0.193  0.255  0.219
MM    0.040  0.182  0.188  0.200  0.193  0.236  0.205
MA    0.035  0.023  0.148  0.196  0.177  0.211  0.194
SA    0.053  0.019  0.032  0.180  0.195  0.254  0.220
TW    0.022  0.029  0.031  0.032  0.145  0.215  0.189
VN    0.039  0.027  0.019  0.046  0.024  0.236  0.243
KN    0.028  0.021  0.028  0.037  0.024  0.032  0.185

Intrapopulational divergences are along the diagonal (underlined). Interpopulational divergences and interpopulational divergences corrected for intrapopulational variation (NEI and TAJIMA 1983) are above (bold type) and below (italics) the diagonal, respectively. Population abbreviations are as in Table 1.

The Kadazan represent the largest ethnic group in Sabah and are thought to have originated from an Austronesian migration from South China (TAN et al. 1979). The other ethnic groups from Sabah (Berungei, Murut, Rungus) and the Northern Borneo Bisaya had mtDNAs similar to those of the Kadazan. For the Bisaya (106, 122), this was unexpected, since their mtDNAs did not resemble previously reported Filipino haplotypes (CANN 1982; CANN, STONEKING and WILSON 1987). This probably reflects the partial assimilation of these minority groups into the Kadazan.

Papua New Guinea vs. Southeast Asia: There are several similarities between the haplotypes of PNG and those of the Malay Aborigines, Malays and Sabah Aborigines. Based on shared haplotype character states, the Southeast Asians appear closest to the coastal PNG populations. The combined HincII/HpaI site gain at np 207 observed in coastal and highland PNG was found in our haplotype 90 (MM09), which is virtually identical to PNG type 94 (group S, Figures 2 and 3) (STONEKING et al. 1990). Additionally, the deletion haplotype found in Southeast Asia (haplotype 54, Figures 2 and 3) is also found in coastal New Guinea [type P119 (STONEKING et al. 1990)]. All other PNG deletion haplotypes fall within the Southeast Asian haplotype group D*. The group A haplotypes frequently observed in the Malay Aborigines and Vietnamese are also found on the southern coast of New Guinea in type P150 (STONEKING et al. 1990). The presence of these Southeast Asian mtDNAs in coastal PNG is consistent with a postulated Southeast Asian origin of these populations (BELLWOOD 1987; STONEKING et al. 1990). Additional Sabah and Malay peninsula haplotypes shared with coastal PNG include haplotype 83 (haplotype group G), which is essentially type P68 of the southern coast of New Guinea, and the related group G haplotypes 89, 94, 106, 109, and 119, which share the HhaI site loss at np 7598 and are similar to PNG types P67-P69 (STONEKING et al. 1990). Thus, it appears that both the Malay peninsula and Borneo (Sabah) populations retain remnants of the Austronesian migration that expanded into the Pacific Basin and coastal PNG.

FIGURE 3.—A phylogeny of Southeast Asian and Papua New Guinea (PNG) mtDNA haplotypes. The rooting and branching algorithm for this tree is identical to that of Figure 2, and the tree has a length of 351 steps. PNG haplotypes are indicated by solid upright triangles, with the numbering of types matching the nomenclature of STONEKING et al. (1990). A new haplotype group (U) is formed by the PNG highland mtDNAs (types 11, 25-45), but otherwise the original branching structure is preserved. PNG types with asterisks signify mtDNAs with the 9-bp intergenic deletion.

Certain Southeast Asian mtDNAs also show affinities with those of the PNG highlands. Some Sabah Aboriginal mtDNAs share the combined DdeI site gain at np 8569 and HaeIII np 8572 site loss with PNG highlanders. The Malay peninsula populations share a DdeI site loss at np 1715 with the PNG highlands, and Vietnamese haplotype 40 shares site gains at nps 15882 (AvaII), 10394 (DdeI) and 10397 (AluI) with PNG type P37. However, these populations often lack other mutations that may be associated with these site losses or gains, making clear associations between Borneo (Sabah), the Malay peninsula, and highland PNG difficult.

AluI/DdeI np 10397/10394 sites: The overlapping AluI and DdeI sites at nps 10397 and 10394 appear to be ancient mutations. This pair of sites was prevalent in every Southeast Asian population and divided each of them into two major groups (Figures 2 and 3). The DdeI site has been found in mtDNAs from every racial group (CANN, STONEKING and WILSON 1987; BROWN et al. 1992), and is present in the most divergent African haplotypes reported (CANN, STONEKING and WILSON 1987), indicating its antiquity. The AluI site has not been previously reported, but it correlates highly with a reported AluI site at np 1403, which is consistently associated with the DdeI site at np 10394 (CANN, STONEKING and WILSON 1987; STONEKING et al. 1990). It seems likely that the putative AluI site at np 1403 was previously misplaced and is in fact at np 10397. If this is the case, these sites also subdivide the PNG mtDNAs into two major groups (STONEKING et al. 1990), indicative of a common Mongoloid origin.

COII/tRNALys length polymorphism: The COII/tRNALys deletion appears to have originated in an mtDNA similar to haplotype 54, probably in central China. As migrating populations radiated out from this region, successive founder events resulted in increased frequencies of the deletion haplotypes in some populations. Today the deletion is distributed among Pacific coastal and island populations (HORAI and MATSUNAGA 1986; CANN, STONEKING and WILSON 1987; HERTZBERG et al. 1989; STONEKING et al. 1990) as well as the Amerindians (SCHURR et al. 1990; TORRONI et al. 1992) (Table 2).
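The frequencies in Table 2 are proportions of deleted mtDNAs, and the pooled figure for this study (25 of 153, 16.3%) is their sum over the seven populations. The sketch below recomputes these values from the Table 2 counts and lists the populations from highest to lowest frequency. Only a few of the comparison populations from other studies are included, and the variable names are illustrative.

```python
# (deleted mtDNAs, total sample size) from Table 2
THIS_STUDY = {
    "Malaysian Chinese": (1, 14), "Malays": (2, 14),
    "Malay Aborigines": (1, 32), "Sabah Aborigines": (6, 32),
    "Taiwanese Han": (8, 20), "Vietnamese": (5, 28), "Koreans": (2, 13),
}
OTHER_STUDIES = {"Polynesians": (139, 150), "Fijians": (23, 28),
                 "Pima": (14, 31), "PNG Highlanders": (0, 64)}


def deletion_percent(n_deleted, n_total):
    """Percentage of mtDNAs carrying the 9-bp deletion."""
    return 100 * n_deleted / n_total


pooled_deleted = sum(n for n, _ in THIS_STUDY.values())
pooled_total = sum(total for _, total in THIS_STUDY.values())
print(f"this study: {pooled_deleted}/{pooled_total} = "
      f"{deletion_percent(pooled_deleted, pooled_total):.1f}%")  # 25/153 = 16.3%

combined = {**THIS_STUDY, **OTHER_STUDIES}
for pop, (n, total) in sorted(combined.items(),
                              key=lambda item: -deletion_percent(*item[1])):
    print(f"{pop:<18} {deletion_percent(n, total):5.1f}%")
```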

The deletion appears to have been associated with at least two major migrations. One migration moved south along the Asian coastline, eastward into Indonesia, and out into the Pacific islands (HERTZBERG et al. 1989). The other migration went north into Siberia and eventually crossed the Bering land bridge into the New World, yielding the Amerindians (SCHURR et al. 1990).

Subsequent nucleotide substitutions in ancestral mtDNA haplotypes have resulted in two distinct clades, creating haplotype groups C and D* (Figure 2). The DdeI site at np 10394 is present in haplotype group C but absent in D*. In addition, deletion haplotypes within group C appear more divergent (0.089%) than those of haplotype group D* (0.067%). Consequently, the deletion haplotypes within group C appear to be older than those of haplotype group D*. Haplotype group D* consists of haplotype 54 and closely associated haplotypes. All the deletion haplotypes seen in aborigines from PNG (STONEKING et al. 1990) and in Amerindians (SCHURR et al. 1990) fall within haplotype group D*. Thus, it would appear that the recent migrants from Asia that carried the COII/tRNALys deletion belonged to haplotype group D*.

While groups C and D* are probably derived from a single COII/tRNALys deletion event, deletions associated with two other haplotypes are probably the result of independent events. Haplotypes 43 in a Vietnamese and 21 in a Malaysian Chinese do not fit into either group C or D* (Figure 2). Both differ by at least 6 mutational events from the next closest deletion haplotype, 54. Haplotype 21 has very little similarity to any other haplotype in our study. In contrast, 43 shares all the same restriction sites with haplotype 37 (haplotype group B), which does not have the deletion. To directly verify the presence of the deletion in haplotype 43, we sequenced the mtDNA through region V and found the reported deletion.

The marked differences of haplotypes 43 and 21 relative to each other and to groups C and D* can best be explained by parallel independent deletions. To further test this possibility, we weighted the deletion in phylogenetic analyses, thereby forcing the tree to assume a structure in which the deletion could arise only once. This increased the tree length by 12 additional steps (data not shown). Consequently, it appears that there must have been three deletion events: one in a haplotype similar to 54, giving the major deletion clades; one in a haplotype similar to 37, creating haplotype 43; and one in an undefined haplotype, yielding haplotype 21. Deletion haplotypes 58 and 61 may also have arisen independently; alternatively, 58 may be a derivative of 54, and haplotype 61 a derivative of 54 via parallel site losses at np 12406 (HincII/HpaI) and np 16517 (HaeIII).
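The argument that forcing a single origin of the deletion lengthens the tree rests on how parsimony counts character changes. PAUP scores all characters over many candidate trees; the sketch below shows only the core counting step for one binary character on one fixed rooted tree, using Fitch's small-parsimony algorithm (named here as an illustration, not as the paper's exact procedure). The tree shapes and the non-deletion leaves x1 and x2 are hypothetical; only the assignment of the deletion to haplotypes 54, 43 and 21, and its absence from 37, comes from the text.

```python
def fitch(tree, states):
    """Fitch small parsimony for one character on a rooted binary tree.

    tree is either a leaf name (str) or a (left, right) tuple;
    states maps each leaf name to its character state.
    Returns (candidate states at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Children disagree: one change is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


# 1 = 9-bp deletion present, 0 = absent (x1 and x2 are hypothetical leaves)
DELETION = {"54": 1, "43": 1, "21": 1, "37": 0, "x1": 0, "x2": 0}

# Deletion carriers scattered among otherwise unrelated lineages
scattered = ((("54", "x1"), ("37", "43")), ("21", "x2"))
# The same leaves rearranged so that all carriers form one clade
grouped = ((("54", "43"), "21"), (("x1", "37"), "x2"))

print("scattered:", fitch(scattered, DELETION)[1])  # 3 changes
print("grouped:", fitch(grouped, DELETION)[1])      # 1 change
```

In a full analysis, a topology that groups the carriers pays for the single origin with extra changes in other restriction-site characters; the 12-step increase reported above is that net cost summed over all characters, which this single-character sketch does not attempt to reproduce.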

The two insertion mutations may also have had independent origins, since they are associated with very different haplotypes, 34 and 120. Our haplotype 120 and CANN's insertion type 92 (CANN 1982; CANN, STONEKING and WILSON 1987) are similar to haplotype group A. However, haplotype 34 is quite distinct, retaining three phylogenetically important sites (HincII and HpaI np 12406, HaeII np 9052, and AvaII np 8249) and carrying five other point mutations that create unique restriction sites.

In summary, all Southeast Asian populations analyzed in this study appear to have common origins, consistent with a hypothesized Southern Mongoloid origin of the peoples in this region (BELLWOOD 1985, and references therein; TURNER 1987). These mtDNAs are divided into two major branches by the AluI/DdeI nps 10397/10394 polymorphisms. The populations from the Malay peninsula and Borneo (Sabah) appear to have genetic ties to those of coastal PNG. The high sequence diversity of the Vietnamese and the high frequency of HincII/HpaI morph 1 haplotypes suggest that Southern China is the center of Asian mtDNA radiation (BLANC et al. 1983). The deletion and insertion mutations appear to have occurred multiple times in Asian mtDNA lineages. The high frequencies of the deletion haplotype group D* mtDNAs in Southeast Asia, the Pacific islands, and the New World imply that the migrants carrying this marker were descended from a single founder population.
APPENDIX A


The oligonucleotide primers for PCR amplification of Southeast Asian mtDNAs are given in Table 4.


TABLE 4

Oligonucleotide primers for PCR amplifications of Southeast Asian mtDNAs

5'-3' coordinates (forward, reverse)    Tm (°C)
1562-1581, 3717-3701                    51
3007-3023, 5917-5898                    55
5317-5333, 7608-7588                    57
7392-7410, 8921-8902                    57
8282-8305, 10107-10088                  57
9911-9932, 11873-11851                  69
11673-11691, 13950-13932                57
13914-13930, 16547-16527                47
16453-16472, 1696-1677                  61

Primer pair coordinates are positioned according to ANDERSON et al. (1981). The coordinates before the comma correspond to the forward primer and those after the comma to the reverse primer. The Tm used for annealing was the lower of the two primers' values, each calculated from the nucleotide sequence of the primer as Tm = 4(C + G) + 2(T + A) - 5 °C.
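The annealing-temperature rule in the Table 4 footnote is easy to apply programmatically. The primer sequences themselves are not given in this section, so the sketch below uses made-up 20-mers; only the formula, Tm = 4(C + G) + 2(T + A) - 5, and the choice of the lower value for the pair come from the text.

```python
def primer_tm(seq):
    """Tm in degrees C by the rule in Table 4: 4 per G/C, 2 per A/T, minus 5."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at - 5


def annealing_tm(forward, reverse):
    """Annealing temperature for a primer pair: the lower of the two Tm values."""
    return min(primer_tm(forward), primer_tm(reverse))


# Hypothetical primers, not the sequences used in the study
fwd = "ACGTACGTACGTACGTACGT"  # 10 G/C, 10 A/T -> 55
rev = "AATTAATTAATTAATTGGCC"  # 4 G/C, 16 A/T -> 43
print(primer_tm(fwd), primer_tm(rev), annealing_tm(fwd, rev))  # 55 43 43
```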


APPENDIX B 


Figure 4 presents the polymorphic restriction sites observed in Southeast Asian mtDNA haplotypes. mtDNA haplotypes are numbered according to Table 1. A “1” indicates the presence of a site and a “0” the absence of a site, except for region V, where “1” indicates a single copy of the 9-bp repeat, “2” indicates two copies of the repeat, and “3” indicates the presence of the 4-bp insertion. Sites are numbered from the first nucleotide of the recognition sequence according to the published sequence (ANDERSON et al. 1981); bold face numbers indicate site gains relative to the published sequence and non-bold face numbers indicate site losses. The 18 restriction enzymes are designated by the following single-letter code: AluI, a; AvaII, b; DdeI, c; HaeIII, e; HhaI, f; HinfI, g; HpaI, h; HpaII, i; MboI, j; RsaI, k; TaqI, l; BamHI, m; HaeII, n; HincII, o; PstI, p; PvuII, q; XbaI, r; XhoI, s (after CANN and WILSON 1984; CANN, STONEKING and WILSON 1987). Sites separated by a diagonal line indicate simultaneous site gains or site losses for two different enzymes, or a site gain for one enzyme and a site loss for another, caused by a single inferred nucleotide substitution; such sites are counted as only one restriction site polymorphism in the analysis. Sites marked with an asterisk were found to be present or absent in all samples (except where polymorphic), contrary to the published sequence, and were confirmed by sequencing (WALLACE et al. 1988; J. BROWN et al. 1992).
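The site labels used in Table 1 and Figure 4 combine a position with the single-letter enzyme code listed above, with slashes joining sites that reflect one inferred substitution. A minimal parser for that notation might look like the following. It assumes plain-text labels, so the bold/non-bold distinction between gains and losses is not represented, and the function and variable names are illustrative.

```python
import re

# Single-letter enzyme code from APPENDIX B
ENZYMES = {
    "a": "AluI", "b": "AvaII", "c": "DdeI", "e": "HaeIII", "f": "HhaI",
    "g": "HinfI", "h": "HpaI", "i": "HpaII", "j": "MboI", "k": "RsaI",
    "l": "TaqI", "m": "BamHI", "n": "HaeII", "o": "HincII", "p": "PstI",
    "q": "PvuII", "r": "XbaI", "s": "XhoI",
}
# A position followed by exactly one of the enzyme letters above
SITE = re.compile(r"(\d+)([" + "".join(ENZYMES) + r"])")


def parse_polymorphism(label):
    """Split a label such as '12406h/12406o' into (position, enzyme) pairs.

    Slash-joined parts come from one inferred substitution, so they are
    returned together as a single polymorphism. A leading asterisk (sites
    confirmed by sequencing) is ignored.
    """
    parts = []
    for token in label.lstrip("*").split("/"):
        match = SITE.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognized site label: {token!r}")
        parts.append((int(match.group(1)), ENZYMES[match.group(2)]))
    return tuple(parts)


print(parse_polymorphism("12406h/12406o"))  # ((12406, 'HpaI'), (12406, 'HincII'))
print(parse_polymorphism("3534c/3537a"))    # ((3534, 'DdeI'), (3537, 'AluI'))
```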


FIGURE 4.—Polymorphic restriction sites observed in Southeast Asian mtDNA haplotypes. (Site-by-haplotype matrix not reproduced.)